  1. Spark
  2. SPARK-3424

KMeans Plus Plus is too slow


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 1.0.2
    • Fix Version/s: 1.3.0
    • Component/s: MLlib
    • Labels: None

    Description

      The KMeansPlusPlus algorithm is implemented in time O(m k^2), where m is the number of rounds of the KMeansParallel algorithm and k is the number of clusters.

      This can be dramatically improved by maintaining the distance to the closest cluster center from round to round and then incrementally updating that value for each point. Each incremental update takes O(1) time, which reduces the running time of KMeansPlusPlus to O(m k). For large k, this is significant.
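
The incremental update described above can be sketched roughly as follows. This is a minimal, single-machine illustration in Python, not the MLlib code: the names `sq_dist`, `kmeans_pp_seed`, and `closest` are hypothetical, and points are assumed to be plain tuples of numbers. The key step is that each newly chosen center costs one distance evaluation per point, compared against a cached value, instead of recomputing the distance to every center chosen so far.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_seed(points, k, rng=None):
    """Pick k seed centers with k-means++ style weighted sampling.

    Hypothetical helper for illustration only. Returns the chosen centers
    and the cached squared distance from each point to its nearest center.
    """
    rng = rng or random.Random(0)
    centers = [points[rng.randrange(len(points))]]
    # closest[i] caches the squared distance from points[i] to the
    # nearest center chosen so far.
    closest = [sq_dist(p, centers[0]) for p in points]

    while len(centers) < k:
        if sum(closest) == 0:
            # Every point coincides with a chosen center; fall back to uniform.
            idx = rng.randrange(len(points))
        else:
            # Sample an index with probability proportional to closest[i].
            idx = rng.choices(range(len(points)), weights=closest)[0]
        new_center = points[idx]
        centers.append(new_center)

        # Incremental update: one distance computation per point,
        # compared against the cached minimum.
        for i, p in enumerate(points):
            d = sq_dist(p, new_center)
            if d < closest[i]:
                closest[i] = d

    return centers, closest
```

After the loop, `closest` holds the same values a full recomputation against all centers would give, so the cache can be carried forward to the next round unchanged.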

            People

              Assignee: Derrick Burns (derrickburns)
              Reporter: Derrick Burns (derrickburns)
              Votes: 0
              Watchers: 6

              Dates

                Created:
                Updated:
                Resolved: